CD Quality Fairy Tale

SacRat

Less than ten years ago, when the MP3 format was still exotic, the promise of roughly tenfold compression without quality loss seemed plausible. At the time, ADPCM was one of the few ways to store sound, and it offered rather low quality (and huge file sizes), much lower than CD quality. That is why MP3 came as a shock: Fraunhofer announced that a 128 Kb/s MP3 file would sound like the original CD, and with the computer sound cards and speaker systems of that era, there really was no audible difference.

As time passed, home PCs received more capable sound cards, cheap speakers gave way to proper stereo systems, and the quality of those systems grew tremendously. It turned out that the "classic" 128 Kb/s bitrate was not enough to store high-quality sound. But why?

Most files encoded at the "classic" bitrate contain frequencies only up to 16-17 kHz. Many people can in fact hear sounds at 19-20 kHz (the limit decreases with age), but the absence of these frequencies matters little to us; the most important range for our ears is 1 to 4 kHz.

You can try an experiment: remove everything above 17 kHz from a sample 10 to 30 seconds long (CoolEdit can do this, for example). Then run an ABX test: you listen to two sound fragments ("A" and "B") and compare each to fragment "X", which is randomly chosen to be one of the two. The number of correct identifications shows whether you can actually hear the difference. Even if you have wide-range hearing and a good sound system (most consumer systems struggle with high frequencies), whatever difference you notice will hardly be irritating.
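The "remove everything above 17 kHz" step can be sketched in a few lines. This is a minimal brick-wall filter built on the FFT, assuming a mono signal at 44.1 kHz; the test tone and the `lowpass_fft` name are illustrative, not from any real audio tool.

```python
import numpy as np

def lowpass_fft(signal, sample_rate, cutoff_hz):
    """Zero out every frequency component above cutoff_hz (brick-wall low-pass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# One-second test tone: 1 kHz (clearly audible) mixed with 18 kHz
# (above the 17 kHz cutoff, inaudible to many listeners anyway).
rate = 44100
t = np.arange(rate) / rate
mixed = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 18000 * t)
filtered = lowpass_fft(mixed, rate, 17000)
```

A real editor like CoolEdit uses smoother filters to avoid ringing, but the principle is the same: after filtering, only the sub-17 kHz content remains for the ABX comparison.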

Artifacts. The real problem with low-bitrate recordings is the various distortions and artifacts. The most noticeable kind is the noise that appears when encoding certain types of sound (applause, for example). This background noise, similar to the sound of running water, is often prominent enough to tell you the file has been compressed.

Distortion of the stereo panorama. This artifact is most noticeable through headphones and is often caused by the imperfect Joint Stereo model, in which two channels carrying different material (e.g. guitar on the right, drums or piano on the left) are encoded together. With speakers placed close to each other you may not hear it, but at a more appropriate distance the effect is quite noticeable.

Pre-echo. This artifact is common to all modern "lossy" encoders: a faint echo appears before the signal itself is heard. It is easiest to notice on fragments with lots of hi-hats and cymbals. Although pre-echo is present even at high bitrates, it is hardly noticeable when music plays in the background (i.e., at low volume).

Other effects and distortions (alien sounds, and so on) may appear as well. Remember that different encoders introduce different artifacts into the sound. The classic example is the Xing encoder, a leader in encoding speed as well as in the number of artifacts.

You do not need "golden ears" to hear the difference between a 128 Kb/s MP3 file and the original wave file grabbed from a CD; you just need to run an ABX test. Anyone who has tried it once will never again believe the two files sound identical.

Is the situation really that bad? Not entirely. In practice the difference can be less noticeable than in theory: it depends on the listener, the audio system, and the music itself. Some fragments do sound identical even after 128 Kb/s MP3 encoding. Nevertheless, one can say for sure that these classic-bitrate MP3 files usually differ too much from the original for the term "CD quality" to apply.

To increase the sound quality of the output file, one has to increase the bitrate, and with it the file size. It is a vicious circle: the file is encoded to reduce its size, yet the lossy algorithm reduces quality along with it. What MP3 bitrate is enough for so-called "transparent" encoding (when the original and the encoded file sound the same) is a difficult question. Some people simply encode at the highest bitrate the MP3 format allows, 320 Kb/s. This achieves roughly 1:4 compression, so the files are rather large, but the quality is so high that even experts with hi-fi systems usually cannot tell the difference (though occasionally they can, due to peculiarities of the MP3 format). The common question is how to make the output file smaller without noticeable quality loss.
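The "1:4" figure above follows directly from the CD bit budget. A quick back-of-the-envelope check (the 4-minute track length is just an example):

```python
# CD audio: 44,100 samples/s * 16 bits * 2 channels
cd_bitrate = 44100 * 16 * 2          # 1,411,200 bits/s
mp3_bitrate = 320_000                # 320 Kb/s, the MP3 maximum

ratio = cd_bitrate / mp3_bitrate     # ~4.41, hence the "1:4" compression figure

# Size of a 4-minute track at each rate, in megabytes
seconds = 4 * 60
cd_mb = cd_bitrate * seconds / 8 / 1_000_000    # ~42.3 MB
mp3_mb = mp3_bitrate * seconds / 8 / 1_000_000  # 9.6 MB
```

So even at the maximum MP3 bitrate, a typical track shrinks from about 42 MB to under 10 MB.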

Of course, this means lowering the bitrate, i.e., the amount of information transferred per second. But what value is optimal? According to tests, 256 Kb/s is enough for CD-like quality; in some cases 192 Kb/s, 160 Kb/s, or even less will do. Where do the bits actually go? The problem with a constant bitrate is that every second of sound gets exactly the same number of bits: one second of silence and one second of classical music cost the same. In the first case the chosen bitrate is clearly excessive; in the second it may be insufficient. The bit-reservoir technique lets the encoder spend extra bits on complicated passages, but the range over which it helps is very narrow.
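The constant-bitrate waste is easy to see in numbers: the cost of a CBR stream depends only on its duration, never on its content. A trivial sketch (the function name is illustrative):

```python
def cbr_bytes(bitrate_kbps, seconds):
    """Bytes consumed by a constant-bitrate stream: rate * time, nothing else."""
    return bitrate_kbps * 1000 * seconds // 8

# At 128 Kb/s, one second of silence costs exactly as much
# as one second of a full orchestra: content is irrelevant.
silence_cost = cbr_bytes(128, 1)     # 16,000 bytes
orchestra_cost = cbr_bytes(128, 1)   # 16,000 bytes
```

Those 16,000 bytes are pure waste on silence, yet possibly too few for the orchestra.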

The VBR (variable bitrate) technique was designed for more efficient encoding: it adapts the bitrate to the music. For MP3 files, a standard set of values between 32 and 320 Kb/s is used (32/48/56/64/96/112/128/160/192/224/256/320). VBR encoding is more accurate: simple fragments are encoded at low bitrates, complicated ones at higher bitrates. The obvious drawback is that the output file size is not known in advance, which makes VBR poorly suited for Internet broadcasting. ABR (average bitrate) is a compromise between CBR (constant bitrate) and VBR: it gives higher quality than CBR at almost the same file size. Still, in most cases VBR is preferable.
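The idea can be sketched as a toy model: for each fragment, pick the smallest standard MP3 bitrate that covers its "demand". Real encoders decide per frame using psychoacoustic analysis, and the demand numbers below are invented purely for illustration.

```python
MP3_BITRATES = [32, 48, 56, 64, 96, 112, 128, 160, 192, 224, 256, 320]  # Kb/s

def pick_bitrate(required_kbps):
    """Smallest standard MP3 bitrate covering the required rate (toy model)."""
    for b in MP3_BITRATES:
        if b >= required_kbps:
            return b
    return MP3_BITRATES[-1]  # demand exceeds the format maximum: clamp to 320

# Hypothetical per-fragment demands: silence, simple pop, dense classical
demands = [10, 150, 300]
chosen = [pick_bitrate(d) for d in demands]   # [32, 160, 320]
average = sum(chosen) / len(chosen)           # the "average bitrate" of the file
```

The unpredictable `average` is exactly why the final file size cannot be known in advance, and why ABR instead constrains the encoder to hit a chosen average.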

It is important to note that VBR is not unique to MP3: Ogg Vorbis, AAC (Advanced Audio Coding), and MPEGplus support it as well.

Returning to the question of minimal bitrate at maximum quality, the best approach is to compress with VBR. Several encoders currently support it; experts consider LAME one of the most successful. Its latest versions include so-called "presets", i.e., predefined settings; in most cases the r3mix or alt-preset standard preset should be used. The resulting bitrate will vary from about 170 to 200 Kb/s.

Recently, after Microsoft released the newest, ninth version of its WMA (Windows Media Audio) codec, it began advertising the format heavily. CD quality at 64 Kb/s and below was announced, along with VBR and lossless compression support, although other codecs already offer these features. But flawless sound at 64 Kb/s is hardly possible; in practice it is mostly an advertising trick. Independent tests of various codecs at this bitrate (see http://ff123.net for details) gave the following result: WMA8 was the worst of all the encoders tested, losing many points to the commercial MP3Pro and the free OGG. My own experience confirms those tests: WMA-encoded files show significant sound distortion even at higher bitrates.


To be continued...


Please send your comments, wishes, and the rest of the stuff to:
Taras Brizitsky [alias SacRat].


Translated into English by Sergei Dubarev aka H.A.